Accept "pipeline" to generate SearchService2 on sync processed Manifests #623

Draft
JackLewis-digirati wants to merge 32 commits into develop from feature/textServiceNoIngest

JackLewis-digirati commented Jun 15, 2026

Collaborator

Resolves #617

What does this change?

This PR enables pipelines in iiif-presentation and implements a connection to the text-services project.

Note

This PR now wraps database operations in a transaction so that a failure to submit a manifest rolls everything back. Additionally, because text services requires the manifest to be available in S3 from the moment of submission, the staging manifest is saved to S3 before the pipeline job is submitted. If submission fails, that write is rolled back by deleting the staging manifest from S3.
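
The ordering described in this note can be sketched as below. This is a minimal Python illustration rather than the PR's C# code: `FakeS3`, `FakeDb`, `save_manifest_with_pipeline`, the `staging/` key prefix and the `Submitted` status are all hypothetical stand-ins. It only shows why the S3 write needs its own compensating delete, since a database transaction cannot undo it.

```python
class SubmissionError(Exception):
    """Raised when the (hypothetical) pipeline submission fails."""


class FakeS3:
    """In-memory stand-in for the staging bucket."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

    def delete(self, key):
        self.objects.pop(key, None)


class FakeDb:
    """In-memory stand-in for a transactional database."""

    def __init__(self):
        self.committed = []
        self._pending = []

    def add(self, row):
        self._pending.append(row)

    def commit(self):
        self.committed.extend(self._pending)
        self._pending = []

    def rollback(self):
        self._pending = []


def save_manifest_with_pipeline(db, s3, manifest_id, manifest_json, submit_job):
    staging_key = f"staging/{manifest_id}"  # hypothetical key layout
    # The staged manifest must exist in S3 before the job is submitted.
    s3.put(staging_key, manifest_json)
    try:
        db.add({"manifest_id": manifest_id, "status": "Submitted"})
        submit_job(manifest_id)  # may raise
        db.commit()
    except Exception:
        # Undo both halves: the DB transaction, and the S3 write that the
        # transaction cannot cover on its own.
        db.rollback()
        s3.delete(staging_key)
        raise
```

If `submit_job` raises, neither the pending row nor the staged object survives; on success both are kept.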

Database Migration

Note

Details of migration. What does it change? Is it breaking/non-breaking?

  • What it does: adds the pipeline_jobs table, which tracks jobs submitted to text-services and can be extended to other pipelines in future
  • Breaking Change? No

Configuration Changes

Note

This PR introduces configuration changes.

| Service | AppSetting | Required? | Description | Default |
| --- | --- | --- | --- | --- |
| API | TextServices:BuilderApiUri | N | The location of the text services builder | null |
| API | TextServices:BuilderApiTimeoutSeconds | N | Timeout for HTTP requests to builder | 5 |
| BackgroundHandler | TextServices:SearchApiUri | N | The location of the text services search | null |
| BackgroundHandler | TextServices:SearchApiTimeoutSeconds | N | Timeout for HTTP requests to search | 30 |
| BackgroundHandler | AWS:SQS:TextJobQueueName | N | The SQS queue holding completed text services jobs | null |

JackLewis-digirati commented Jun 24, 2026

Collaborator Author

What this PR does

Implements the text-services pipeline integration (issue #617). When a manifest is created or updated with a pipeline property containing a text step, the API:

  1. Creates a PipelineJob record (new DB table) tracking the job status
  2. Submits the job to the text-services API via TextServicesClient
  3. Returns 202 Accepted while the job is in flight (the staged manifest is held in S3)

Once the text service finishes, a background SQS handler (TextServiceJobCompletionMessageHandler) picks up the completion message and:

  • On success: reads the staged manifest from S3, merges any SearchService2 entries from the text-augmented manifest into it, writes the final manifest to the public bucket, and marks the job as Completed
  • On failure: marks the job as Failed, deletes the staged manifest from S3
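
The two branches above can be sketched roughly as follows. This is a Python illustration, not the handler's actual code: plain dicts stand in for the job table and the S3 buckets, and the message fields (`jobId`, `status`), the `fetch_augmented` callback and the `merge` callback are assumptions. Non-completed messages keep the status they carry, following the commit note in this PR about saving the status provided in the message.

```python
def handle_completion(message, jobs, staging, public, fetch_augmented, merge):
    """Apply a completion message to the job table and the buckets.

    jobs, staging and public are dicts standing in for the pipeline job
    table, the staging bucket and the public bucket respectively.
    """
    job = jobs[message["jobId"]]
    manifest_id = job["manifest_id"]

    if message["status"] == "Completed":
        staged = staging[manifest_id]
        # Merge text-service output into the staged manifest, then publish.
        public[manifest_id] = merge(staged, fetch_augmented(manifest_id))
        job["status"] = "Completed"
    else:
        # Drop the staged manifest and record the reported status.
        staging.pop(manifest_id, None)
        job["status"] = message["status"]
```

How the text-augmented manifest is actually retrieved is not stated in the description, so it is left as a callback here.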

Key design decisions

PipelineJob entity follows the same ManifestId? / CollectionId? pattern as Hierarchy - nullable FKs with a check constraint ensuring exactly one is set (num_nonnulls(manifest_id, collection_id) = 1). This allows the same table to be reused for collection-level pipeline jobs in future without a schema change.
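
For readers unfamiliar with PostgreSQL's `num_nonnulls`, the rule the check constraint enforces can be mirrored in a few lines of Python. The helper `has_exactly_one_parent` is a hypothetical name used only for illustration; the real enforcement happens in the database.

```python
def num_nonnulls(*values):
    """Count the arguments that are not None, like PostgreSQL's num_nonnulls."""
    return sum(value is not None for value in values)


def has_exactly_one_parent(manifest_id, collection_id):
    """Mirror of the check: num_nonnulls(manifest_id, collection_id) = 1."""
    return num_nonnulls(manifest_id, collection_id) == 1
```

A row with only `manifest_id` set, or only `collection_id` set, passes; a row with both or neither is rejected.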

Only SearchService2 is merged from the text-augmented manifest - no other service types. The Search 2 context URL (http://iiif.io/api/search/2/context.json) is also merged into the base manifest's @context.
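
A minimal sketch of this merge rule, assuming manifests are plain JSON-like dicts with a `service` list and an `@context` that may be a string or a list. The function name `merge_search_services` is hypothetical, and adding the context only when a service was actually merged is an assumption of this sketch. Services are de-duplicated by `id`, in the spirit of the `AddDistinctById()` helper mentioned in the commits.

```python
SEARCH2_CONTEXT = "http://iiif.io/api/search/2/context.json"


def merge_search_services(base, augmented):
    """Copy SearchService2 entries from augmented into a copy of base."""
    merged = dict(base)
    services = list(base.get("service", []))
    known_ids = {svc.get("id") for svc in services}

    added = 0
    for svc in augmented.get("service", []):
        # Only SearchService2 is carried over; other service types are ignored.
        if svc.get("type") == "SearchService2" and svc.get("id") not in known_ids:
            services.append(svc)
            known_ids.add(svc.get("id"))
            added += 1

    if added:
        merged["service"] = services
        context = base.get("@context", [])
        if isinstance(context, str):
            context = [context]
        if SEARCH2_CONTEXT not in context:
            # Prepend so that any existing final context entry stays last.
            context = [SEARCH2_CONTEXT, *context]
        merged["@context"] = context
    return merged
```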

IManifestAugmentor is a new interface that abstracts operations that modify Manifests. In addition, the ManifestS3Manager class was refactored because it was doing two jobs: saving to S3 and merging with the IIIF-CS NQ. Now that text-service is an additional source of Manifest content, it felt appropriate to split this out rather than bloat ManifestS3Manager. The split also allows consistent handling of the "original" Manifest.
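
The shape of this split can be illustrated with a small Python sketch. The real interface is C# and its method signature is not shown in this description, so `ManifestAugmentor`, `augment`, both implementing classes and `apply_augmentors` are hypothetical, and the bodies are placeholders rather than the actual DLCS or text-service logic. The point is only that each content source sits behind one consistent call, while persistence stays elsewhere.

```python
from typing import Protocol


class ManifestAugmentor(Protocol):
    """Anything that takes a manifest and returns a modified copy."""

    def augment(self, manifest: dict) -> dict: ...


class DlcsAugmentor:
    """Placeholder for merging content that comes from DLCS."""

    def __init__(self, items):
        self.items = items

    def augment(self, manifest: dict) -> dict:
        return {**manifest, "items": list(self.items)}


class TextServiceAugmentor:
    """Placeholder for merging content that comes from text-services."""

    def __init__(self, services):
        self.services = services

    def augment(self, manifest: dict) -> dict:
        existing = manifest.get("service", [])
        return {**manifest, "service": [*existing, *self.services]}


def apply_augmentors(manifest: dict, augmentors) -> dict:
    """Run each augmentor in order; saving the result is a separate concern."""
    for augmentor in augmentors:
        manifest = augmentor.augment(manifest)
    return manifest
```

A caller would build the list of augmentors it needs, apply them, and only then hand the result to whatever persists the manifest.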

Original manifests are saved if a pipeline is specified; this follows the logic introduced in #576.

JackLewis-digirati and others added 13 commits June 24, 2026 16:20
Use AddDistinctById() helper as used elsewhere, updated it to return the
number of items added, change base type to make more accommodating and
add optional hook to alter item on add.

Set label on AutoComplete and SearchService if not set.

Rather than iterate and read contexts, add search2 context manually as
it's a published constant
Avoids secondary DB call when it can be done in one
Avoids fat interface where both text builder and text search clients are
implemented as one, BackgroundHandler will call both but API will only
ever call 1. Also allows for separate control of Timeouts etc per client
This keeps all the logic for building payload in client, meaning the
caller doesn't need to know how this is constructed.
This should keep it simple for future BackgroundHandler consumer to
implement
We have a known format and expectation for JobIds, this more accurately
represents it compared to a string.
* Save the status provided in message if not completed, rather than
defaulting to Failed. The assumption is it would always be Failed but
this change means we reflect the actual status and could handle new ones
* Add helper for constructing pipelines, following pattern used for
other test entities.
* Update tests to use TextJobId
Shares commonalities between different handlers
* Deserialise message and log if fail
* Helper to retry or dismiss if record not known
* Set log context
This means that only valid pipelines are passed down, making calling
logic simpler. Takes into account name and config.
* Added shorter timeout for builder, as per comment on
ManifestWriteService.RegisterAndSubmitPipelineJobs()
* Renamed CreateOrUpdateJob to UpsertJob to use consistent terminology
Rather than fetch pipeline and later fetch Manifest, do it on one
ManifestS3Manager had 2 different concerns - saving/deleting etc of
manifest in S3 and also merging manifest with content from DLCS. Now
that content can come from text-services it makes sense to split the
DLCS logic from ManifestS3Manager (into DlcsManifestMerger).
Callers can now augment Manifests separately from the ManifestS3Manager
and only use that for persistence. This avoids bloat from S3Manager
while still allowing some logic being shared (like copying of originals)
Currently 2 implementations, for DLCS and TextService. Keeps a
consistent signature for updating manifests and cleanly captures logic
for applying augmentations
donaldgray force-pushed the feature/textServiceNoIngest branch from 6caedb9 to 1053b9d June 26, 2026 16:12
donaldgray changed the title from 'Initial commit adding the ability to use TextServices' to 'Accept "pipeline" to generate SearchService2 on sync processed Manifests' Jun 26, 2026

Development

Successfully merging this pull request may close these issues.

Manifests not requiring IIIF-CS ingestion

2 participants